Goto

Collaborating Authors

 listing 1


Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture

Sargsyan, Karen

arXiv.org Artificial Intelligence

Sequential statistical protocols require meticulous state management and robust error handling -- challenges naturally suited to functional programming. We present a functional architecture for structural enforcement of statistical rigor in automated research systems (AI-Scientists). These LLM-driven systems risk generating spurious discoveries through dynamic hypothesis testing. We introduce the Research monad, a Haskell eDSL that enforces sequential statistical protocols (e.g., Online FDR (false discovery rate) control) using a monad transformer stack. To address risks in hybrid architectures where LLMs generate imperative code, we employ Declarative Scaffolding -- generating rigid harnesses that structurally constrain execution and prevent methodological errors like data leakage. We validate this approach through large-scale simulation (N=2000 hypotheses) and an end-to-end case study, demonstrating essential defense-in-depth for automated science integrity.


STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation

Khule, Tanmay, Marksteiner, Stefan, Alguindigue, Jose, Fuchs, Hannes, Fischmeister, Sebastian, Narayan, Apurva

arXiv.org Artificial Intelligence

In modern automotive development, security testing is critical for safeguarding systems against increasingly advanced threats. Attack trees are widely used to systematically represent potential attack vectors, but generating comprehensive test cases from these trees remains a labor-intensive, error-prone task that has seen limited automation in the context of testing vehicular systems. This paper introduces STAF (Security Test Automation Framework), a novel approach to automating security test case generation. Leveraging Large Language Models (LLMs) and a four-step self-corrective Retrieval-Augmented Generation (RAG) framework, STAF automates the generation of executable security test cases from attack trees, providing an end-to-end solution that encompasses the entire attack surface. We particularly show the elements and processes needed to provide an LLM to actually produce sensible and executable automotive security test suites, along with the integration with an automated testing framework. We further compare our tailored approach with general purpose (vanilla) LLMs and the performance of different LLMs (namely GPT-4.1 and DeepSeek) using our approach. We also demonstrate the method of our operation step-by-step in a concrete case study. Our results show significant improvements in efficiency, accuracy, scalability, and easy integration in any workflow, marking a substantial advancement in automating automotive security testing methodologies. Using TARAs as an input for verfication tests, we create synergies by connecting two vital elements of a secure automotive development process.


A Verification Methodology for Safety Assurance of Robotic Autonomous Systems

Adam, Mustafa, Anisi, David A., Ribeiro, Pedro

arXiv.org Artificial Intelligence

Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.


GenAI-based test case generation and execution in SDV platform

Zyberaj, Denesa, Mazur, Lukasz, Petrovic, Nenad, Verma, Pankhuri, Hirmer, Pascal, Slama, Dirk, Cheng, Xiangwei, Knoll, Alois

arXiv.org Artificial Intelligence

This paper introduces a GenAI-driven approach for automated test case generation, leveraging Large Language Models and Vision-Language Models to translate natural language requirements and system diagrams into structured Gherkin test cases. The methodology integrates Vehicle Signal Specification modeling to standardize vehicle signal definitions, improve compatibility across automotive subsystems, and streamline integration with third-party testing tools. Generated test cases are executed within the digital.auto playground, an open and vendor-neutral environment designed to facilitate rapid validation of software-defined vehicle functionalities. We evaluate our approach using the Child Presence Detection System use case, demonstrating substantial reductions in manual test specification effort and rapid execution of generated tests. Despite significant automation, the generation of test cases and test scripts still requires manual intervention due to current limitations in the GenAI pipeline and constraints of the digital.auto platform.


GPML: Graph Processing for Machine Learning

Jaber, Majed, Michel, Julien, Boutry, Nicolas, Parrend, Pierre

arXiv.org Artificial Intelligence

The dramatic increase of complex, multi-step, and rapidly evolving attacks in dynamic networks involves advanced cyber-threat detectors. The GPML (Graph Processing for Machine Learning) library addresses this need by transforming raw network traffic traces into graph representations, enabling advanced insights into network behaviors. The library provides tools to detect anomalies in interaction and community shifts in dynamic networks. GPML supports community and spectral metrics extraction, enhancing both real-time detection and historical forensics analysis. This library supports modern cybersecurity challenges with a robust, graph-based approach.


WebChecker: A Versatile EVL Plugin for Validating HTML Pages with Bootstrap Frameworks

Cherukuri, Milind

arXiv.org Artificial Intelligence

WebChecker is a plugin for Epsilon Validation Language (EVL), designed to validate both static and dynamic HTML pages utilizing frameworks like Bootstrap. By employing configurable EVL constraints, WebChecker enforces implicit rules governing HTML and CSS frameworks. The effectiveness of the plugin is demonstrated through its application on Bootstrap, the widely adopted HTML, CSS, and JavaScript framework. WebChecker comes with a set of EVL constraints to assess Bootstrap based web pages. To substantiate our claims, I present an illustrative example featuring two solutions that effectively enforce implicit rules.


Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs

Corazza, Jan, Gavran, Ivan, Moreira, Gabriela, Neider, Daniel

arXiv.org Artificial Intelligence

When blockchain systems are said to be trustless, what this really means is that all the trust is put into software. Thus, there are strong incentives to ensure blockchain software is correct -- vulnerabilities here cost millions and break businesses. One of the most powerful ways of establishing software correctness is by using formal methods. Approaches based on formal methods, however, induce a significant overhead in terms of time and expertise required to successfully employ them. Our work addresses this critical disadvantage by automating the creation of a formal model -- a mathematical abstraction of the software system -- which is often a core task when employing formal methods. We perform model synthesis in three phases: we first transpile the code into model stubs; then we "fill in the blanks" using a large language model (LLM); finally, we iteratively repair the generated model, on both syntactical and semantical level. In this way, we significantly reduce the amount of time necessary to create formal models and increase accessibility of valuable software verification methods that rely on them. The practical context of our work was reducing the time-to-value of using formal models for correctness audits of smart contracts.


Cross--layer Formal Verification of Robotic Systems

Raïs, Sylvain, Brunel, Julien, Doose, David, Herbreteau, Frédéric

arXiv.org Artificial Intelligence

Robotic systems are widely used to interact with humans or to perform critical tasks. As a result, it is imperative to provide guarantees about their behavior. Due to the modularity and complexity of robotic systems, their design and verification are often divided into several layers. However, some system properties can only be investigated by considering multiple layers simultaneously. We propose a cross-layer verification method to verify the expected properties of concrete robotic systems. Our method verifies one layer using abstractions of other layers. We propose two approaches: refining the models of the abstract layers and refining the property under verification. A combination of these two approaches seems to be the most promising to ensure model genericity and to avoid the state-space explosion problem.


Process Trace Querying using Knowledge Graphs and Notation3

Van Woensel, William

arXiv.org Artificial Intelligence

In process mining, a log exploration step allows making sense of the event traces; e.g., identifying event patterns and illogical traces, and gaining insight into their variability. To support expressive log exploration, the event log can be converted into a Knowledge Graph (KG), which can then be queried using general-purpose languages. We explore the creation of semantic KG using the Resource Description Framework (RDF) as a data model, combined with the general-purpose Notation3 (N3) rule language for querying. We show how typical trace querying constraints, inspired by the state of the art, can be implemented in N3. We convert case- and object-centric event logs into a trace-based semantic KG; OCEL2 logs are hereby "flattened" into traces based on object paths through the KG. This solution offers (a) expressivity, as queries can instantiate constraints in multiple ways and arbitrarily constrain attributes and relations (e.g., actors, resources); (b) flexibility, as OCEL2 event logs can be serialized as traces in arbitrary ways based on the KG; and (c) extensibility, as others can extend our library by leveraging the same implementation patterns.


CFaults: Model-Based Diagnosis for Fault Localization in C Programs with Multiple Test Cases

Orvalho, Pedro, Janota, Mikoláš, Manquinho, Vasco

arXiv.org Artificial Intelligence

Debugging is one of the most time-consuming and expensive tasks in software development. Several formula-based fault localization (FBFL) methods have been proposed, but they fail to guarantee a set of diagnoses across all failing tests or may produce redundant diagnoses that are not subset-minimal, particularly for programs with multiple faults. This paper introduces a novel fault localization approach for C programs with multiple faults. CFaults leverages Model-Based Diagnosis (MBD) with multiple observations and aggregates all failing test cases into a unified MaxSAT formula. Consequently, our method guarantees consistency across observations and simplifies the fault localization procedure. Experimental results on two benchmark sets of C programs, TCAS and C-Pack-IPAs, show that CFaults is faster than other FBFL approaches like BugAssist and SNIPER. Moreover, CFaults only generates subset-minimal diagnoses of faulty statements, whereas the other approaches tend to enumerate redundant diagnoses.